@satvshr
Contributor

@satvshr satvshr commented Jan 9, 2026

Metadata

@codecov-commenter

codecov-commenter commented Jan 9, 2026

Codecov Report

❌ Patch coverage is 39.76261% with 203 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.94%. Comparing base (99928f8) to head (1b19c08).

Files with missing lines            Patch %    Lines
openml/_api/resources/tasks.py      13.01%     147 Missing ⚠️
openml/_api/http/client.py          46.37%     37 Missing ⚠️
openml/_api/runtime/core.py         77.77%     6 Missing ⚠️
openml/_api/runtime/fallback.py     0.00%      6 Missing ⚠️
openml/_api/resources/datasets.py   77.77%     2 Missing ⚠️
openml/tasks/functions.py           60.00%     2 Missing ⚠️
openml/_api/__init__.py             75.00%     1 Missing ⚠️
openml/_api/config.py               96.87%     1 Missing ⚠️
openml/tasks/task.py                0.00%      1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1611      +/-   ##
==========================================
- Coverage   52.75%   44.94%   -7.82%     
==========================================
  Files          36       46      +10     
  Lines        4333     4508     +175     
==========================================
- Hits         2286     2026     -260     
- Misses       2047     2482     +435     

@geetu040 geetu040 mentioned this pull request Jan 9, 2026
25 tasks
@satvshr satvshr marked this pull request as ready for review January 12, 2026 15:29
Collaborator

@geetu040 geetu040 left a comment

From a high-level review, I noticed a few points that need adjustment:

  • Caching can likely be removed from the SDK, since these concerns should be handled by the base client.
  • I don't see the api_context being used in tasks/functions, so it's not clear to me how the SDK is actually using the new API interface here.
  • Instead of moving entire methods out of tasks/functions.py, it would be better to stick to the goal of minimal SDK changes while enabling v2 support.
  • API calls should be updated at the specific root functions (for example _get_task_description, OpenMLTask._download_split).
  • For listing tasks, please follow the approach discussed in #1575 comment.

)

print(evals_setups.head(10))
print(evals_setups.head(10))
Collaborator

Let's keep these changes away from this PR. If there are some ruff errors in the existing code, they should be fixed in another PR which will probably get merged soon.

Contributor Author

I had accidentally run ruff format . on this branch; the ruff PR getting merged resolved these issues automatically, though.

@satvshr satvshr marked this pull request as draft January 14, 2026 20:25
@satvshr satvshr changed the title [ENH] Tasks Migration [ENH] V1 → V2 API Migration - Tasks Jan 15, 2026
Collaborator

@geetu040 geetu040 left a comment

I have left some comments. Please take a look, and make sure the signatures of all methods in TasksAPI, TasksV1 and TasksV2 stay the same.

def get(self, dataset_id: int) -> OpenMLDataset | tuple[OpenMLDataset, Response]: ...


class TasksAPI(ResourceAPI, ABC):
Collaborator

why are the methods commented out?

Contributor Author

I was going to remove them, if I add abstract methods they have to be for shared functions right? The only shared function right now is get.

Collaborator

if I add abstract methods they have to be for shared functions right?

  1. they create a blueprint of this resource, so one can look at the resource class to see which methods are public and what their inputs and outputs look like
  2. these methods are expected to be implemented in all the child classes, so yes, they are used for shared functions

The only shared function right now is get.

list, delete, ...?

Contributor Author

list, delete, ...?

Not there for v2

Collaborator

Still, the base class should have these. In the v2 class, just raise an exception, or maybe skip it and the exception will be raised automatically.
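
As an illustration of the suggestion above, here is a minimal sketch of a base class that declares every public method while letting one API version leave some of them out. The class names follow the thread, but the method bodies, return values, and error messages are assumptions made for the example, not the PR's code. One caveat worth noting: in Python, a method marked with @abstractmethod that a subclass omits prevents the subclass from being instantiated at all, so the "raised automatically" behaviour only works if the base method is a regular method with a default body that raises.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class TasksAPI(ABC):
    """Blueprint listing every public method of the tasks resource."""

    @abstractmethod
    def get(self, task_id: int) -> dict[str, Any]:
        """Every API version is assumed to implement this."""

    def list(self, *, offset: Optional[int] = None, size: Optional[int] = None) -> Any:
        # Regular (non-abstract) method: a version that lacks the endpoint
        # simply inherits this default and raises when called.
        raise NotImplementedError(f"{type(self).__name__} does not support list")

    def delete(self, task_id: int) -> bool:
        raise NotImplementedError(f"{type(self).__name__} does not support delete")


class TasksV1(TasksAPI):
    """Hypothetical v1 resource that supports all three methods."""

    def get(self, task_id: int) -> dict[str, Any]:
        return {"task_id": task_id, "api": "v1"}

    def list(self, *, offset: Optional[int] = None, size: Optional[int] = None) -> Any:
        return [self.get(1), self.get(2)]

    def delete(self, task_id: int) -> bool:
        return True


class TasksV2(TasksAPI):
    """Hypothetical v2 resource: list and delete are intentionally omitted."""

    def get(self, task_id: int) -> dict[str, Any]:
        return {"task_id": task_id, "api": "v2"}
```

With this layout all three classes expose the same signatures, and calling TasksV2().list() fails with a NotImplementedError that names the class.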

Comment on lines +69 to +77
# @abstractmethod
# def list_tasks(
#     self,
#     *,
#     task_type: TaskType | None = None,
#     offset: int | None = None,
#     size: int | None = None,
#     **filters: Any,
# ):
Collaborator

this method should simply be called list

Contributor Author

Replaced the name in TasksV1 (where the function actually exists)

Collaborator

signatures should be the same for all 3 classes as of now: #1611 (review)

Contributor Author

As I said above, in terms of functionality only get matches; otherwise I'd have done that.

Collaborator

I still don't understand



class TasksV1(TasksAPI):
@openml.utils.thread_safe_if_oslo_installed
Collaborator

you can remove this; it's not needed. It's related to caching and should be handled at the client

Comment on lines 27 to 80
    def get(
        self,
        task_id: int,
        download_splits: bool = False,  # noqa: FBT002
        **get_dataset_kwargs: Any,
    ) -> OpenMLTask:
        """Download OpenML task for a given task ID.
        Downloads the task representation.
        Use the `download_splits` parameter to control whether the splits are downloaded.
        Moreover, you may pass additional parameter (args or kwargs) that are passed to
        :meth:`openml.datasets.get_dataset`.
        Parameters
        ----------
        task_id : int
            The OpenML task id of the task to download.
        download_splits: bool (default=False)
            Whether to download the splits as well.
        get_dataset_kwargs :
            Args and kwargs can be used pass optional parameters to
            :meth:`openml.datasets.get_dataset`.
        Returns
        -------
        task: OpenMLTask
        """
        if not isinstance(task_id, int):
            raise TypeError(f"Task id should be integer, is {type(task_id)}")

        task = self._get_task_description(task_id)
        dataset = get_dataset(task.dataset_id, **get_dataset_kwargs)
        # List of class labels available in dataset description
        # Including class labels as part of task meta data handles
        # the case where data download was initially disabled
        if isinstance(task, (OpenMLClassificationTask, OpenMLLearningCurveTask)):
            task.class_labels = dataset.retrieve_class_labels(task.target_name)
        # Clustering tasks do not have class labels
        # and do not offer download_split
        if download_splits and isinstance(task, OpenMLSupervisedTask):
            task.download_split()

        return task

    def _get_task_description(self, task_id: int) -> OpenMLTask:
        result = self._http.get(f"task/{task_id}", return_response=True)

        if isinstance(result, tuple):
            task, _response = result
        else:
            task = result

        return task
Collaborator

You should not copy this entirely from tasks/functions.py. Only the specific part that loads the task object should be here, which would probably be

       response = self._http.get(f"task/{task_id}")
       task = self._create_task_from_xml(response.text)
       return task

Contributor Author

Did you mean to highlight the entire get function or only _get_task_description? Is this:

        dataset = get_dataset(task.dataset_id, **get_dataset_kwargs)
        # List of class labels available in dataset description
        # Including class labels as part of task meta data handles
        #   the case where data download was initially disabled
        if isinstance(task, (OpenMLClassificationTask, OpenMLLearningCurveTask)):
            task.class_labels = dataset.retrieve_class_labels(task.target_name)
        # Clustering tasks do not have class labels
        # and do not offer download_split
        if download_splits and isinstance(task, OpenMLSupervisedTask):
            task.download_split()

not useful? Why?

Collaborator

Did you mean to highlight the entire get function or only _get_task_description? Is this:

both

not useful? Why?

What should I look at here? This is dataset-related code.

Contributor Author

What should I look at here? This is dataset-related code.

It is assigning attributes to the task object, don't you think that's useful?

Collaborator

OK, I see. You are asking whether this should live in the SDK or the resource class? If it can stay out of the resource class, then it should.
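
To make the split discussed in this thread concrete, the following is a rough sketch in which the resource class only fetches and builds the task object, while the SDK-level function keeps the validation and the dataset-related enrichment. Everything here is a hypothetical stand-in: Task, TasksResource, fake_retrieve_class_labels, the hard-coded values, and the resource argument of get_task are invented for illustration and do not reflect the real openml classes or signatures.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for OpenMLTask."""

    task_id: int
    dataset_id: int
    target_name: str
    class_labels: list[str] = field(default_factory=list)


class TasksResource:
    """Hypothetical resource class: fetch and parse only, nothing else."""

    def get(self, task_id: int) -> Task:
        # A real resource would issue the HTTP request and parse the response;
        # the values below are made up so the sketch is self-contained.
        return Task(task_id=task_id, dataset_id=1, target_name="target")


def fake_retrieve_class_labels(dataset_id: int, target_name: str) -> list[str]:
    # Placeholder for the dataset lookup done in the SDK (made-up labels).
    return ["a", "b"]


def get_task(task_id: int, resource: TasksResource) -> Task:
    """SDK-level function: validation and enrichment stay here."""
    if not isinstance(task_id, int):
        raise TypeError(f"Task id should be integer, is {type(task_id)}")
    task = resource.get(task_id)
    # The dataset-related step stays outside the resource class.
    task.class_labels = fake_retrieve_class_labels(task.dataset_id, task.target_name)
    return task
```

Under these assumptions, swapping a v1 resource for a v2 one would change only what resource.get does, while the class-label handling in get_task stays untouched.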


return self.__list_tasks(api_call=api_call)

def __list_tasks(self, api_call: str) -> pd.DataFrame: # noqa: C901, PLR0912
Collaborator

maybe use better helper functions like _create_list_url and _parse_list_response?

Contributor Author

@satvshr satvshr Jan 15, 2026

I am confused as to what you're trying to say here. Do you mean I should transfer the functionality of list (previously list_tasks) to _create_list_url, and rename __list_tasks to _parse_list_response?

Collaborator

this works fine but __list_tasks is not a good name for a helper function in this class.
I'd suggest you take a look at datasets PR, it does something similar: #1608
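
For readers following the naming discussion, here is one possible shape for the two helpers named above, written as a sketch rather than as the code from either PR. The helper names _create_list_url and _parse_list_response come from the review comment; the TasksV1Sketch class, the injected fetch callable, the URL layout, and the record format are all assumptions made for this example.

```python
from typing import Any, Callable, Iterable, Optional


class TasksV1Sketch:
    """Hypothetical resource showing list() split into two small helpers."""

    def __init__(self, fetch: Callable[[str], Iterable[dict[str, Any]]]) -> None:
        # fetch stands in for the HTTP client: it maps a URL to raw records.
        self._fetch = fetch

    def list(
        self,
        *,
        offset: Optional[int] = None,
        size: Optional[int] = None,
        **filters: Any,
    ) -> dict[int, dict[str, Any]]:
        url = self._create_list_url(offset=offset, size=size, **filters)
        return self._parse_list_response(self._fetch(url))

    @staticmethod
    def _create_list_url(
        *,
        offset: Optional[int] = None,
        size: Optional[int] = None,
        **filters: Any,
    ) -> str:
        # Assumed URL layout: path segments appended as key/value pairs.
        parts = ["task/list"]
        if size is not None:
            parts.append(f"limit/{size}")
        if offset is not None:
            parts.append(f"offset/{offset}")
        for key, value in filters.items():
            if value is not None:
                parts.append(f"{key}/{value}")
        return "/".join(parts)

    @staticmethod
    def _parse_list_response(
        records: Iterable[dict[str, Any]],
    ) -> dict[int, dict[str, Any]]:
        # Index the raw records by integer task id.
        return {int(record["task_id"]): record for record in records}
```

Keeping URL construction and response parsing in separate helpers means each can be tested without a network call, which is the main reason to prefer this over a single long private method.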

Contributor Author

Oh ok only a rename? Will do.

Comment on lines 366 to 371
    def get_tasks(
        self,
        task_ids: list[int],
        download_data: bool | None = None,
        download_qualities: bool | None = None,
    ) -> list[OpenMLTask]:
Collaborator

keep this method in tasks/functions.py, because we are sticking to the rule "minimal sdk changes for v1/v2 compatibility"

Contributor Author

@satvshr satvshr Jan 15, 2026

Shouldn't I do the same for create_task and delete_task too?

Edit: Just saw your comment below :D

Comment on lines 414 to 422
    def create_task(
        self,
        task_type: TaskType,
        dataset_id: int,
        estimation_procedure_id: int,
        target_name: str | None = None,
        evaluation_measure: str | None = None,
        **kwargs: Any,
    ) -> (
Collaborator

this should stay in tasks/functions.py

bool
True if the deletion was successful. False otherwise.
"""
return openml.utils._delete_entity("task", task_id)
Collaborator

I'll implement this in the base class, and you can replace this with it later

Contributor Author

As discussed today during the standup, makes sense


return cls(**common_kwargs)

def list_task_types(self) -> list[dict[str, str | int | None]]:
Collaborator

is this used anywhere?

Contributor Author

@satvshr satvshr Jan 15, 2026

Nope, there is an endpoint for it though, same for get_task_type.

Collaborator

I'd say just remove it, since it suits a TaskType resource, though it's not needed anywhere now

Contributor Author

I added tasktype as part of this PR too. There is only one endpoint for Tasks and 2 for TaskType, so I just added it into Tasks, given they're clubbed together in the docs for endpoints too

Collaborator

they're clubbed together in the docs for endpoints too

In v2, if I remember correctly, this is not the case. Anyway, I still think they should be removed, as they are not being used anywhere in the SDK.

Contributor Author

I was talking about v2 only; tasks and tasktype share the same header in the v2 docs page. The tasktype endpoints are not being used anywhere, so should tasktypes just not exist? Do you want me to remove it from this PR and park it in a draft PR in case it is ever used (which will probably get lost over time), or should we let the endpoints and the functions that deserialize them exist for no reason?

raise OpenMLCacheException(f"Task file for tid {tid} not cached") from e


def _get_estimation_procedure_list() -> list[dict[str, Any]]:
Collaborator

Keep this method here and, inside it, try to use the method list_estimation_procedures already implemented in evaluations/functions.py.

Contributor Author

@satvshr satvshr Jan 15, 2026

list_estimation_procedures returns only the "oml:name", whereas _get_estimation_procedure_list requires more items. They may call the same API, and list_estimation_procedures may be somewhat of a subset of _get_estimation_procedure_list, but that does not mean it can be used inside _get_estimation_procedure_list.

@satvshr
Contributor Author

satvshr commented Jan 16, 2026

@geetu040 I will explain the confusion I am facing to the best of my abilities, and I do feel communicating on the PR threads is yielding no results, hence I will just put everything here:

  1. Over here you say the base class should have v1 functions like list, delete, and create. This means to me that all V1 related functions should be moved from functions.py to TasksV1 (from SDK to resources) and calls inside functions.py should be replaced with calls to api_context.backend.task.method. Over here, you say something similar, stating that TasksV1, V2, and TasksAPI should have the same function signatures.
  2. Over here and here you mention that we should try to keep most of the code in SDK and not resources (contradictory to point 1).
    Completely contrary to point 1 you mention get_tasks along with create_task and delete_task should stay in sdk over here.

I seem to be getting 2 contradictory messages from your end which is where the confusion arises.

4 participants